Text Extraction from Web Images Based on Human Perception and Fuzzy Inference
Abstract
There is a significant need to extract and recognise the semantically important text contained in images on Web pages. This paper proposes a new approach to text extraction from this special class of images. The method attempts to emulate, more closely than before, the way humans perceive colour differences in order to differentiate between text and background regions. Pixels of similar colour (as humans see it) are merged into components, and a fuzzy inference mechanism (using connectivity and colour distance features) is devised to group components into larger character-like regions.
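The component-merging step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's method: plain Euclidean distance in RGB stands in for the perceptual colour difference, and the names `colour_distance` and `merge_similar_pixels` as well as the threshold value are hypothetical choices made for illustration. The fuzzy grouping of components into character-like regions is not shown.

```python
from collections import deque


def colour_distance(a, b):
    """Euclidean distance between two RGB triples.

    Assumption: a simple stand-in, not a perceptual colour-difference measure.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def merge_similar_pixels(image, threshold=30.0):
    """Label 4-connected components of similarly coloured pixels.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    A pixel joins its neighbour's component when their colour distance
    is below `threshold` (a hypothetical value).
    Returns a grid of integer labels with the same shape as `image`.
    """
    height, width = len(image), len(image[0])
    labels = [[-1] * width for _ in range(height)]
    next_label = 0
    for start_y in range(height):
        for start_x in range(width):
            if labels[start_y][start_x] != -1:
                continue  # already assigned to a component
            labels[start_y][start_x] = next_label
            queue = deque([(start_y, start_x)])
            # Breadth-first flood fill over similar neighbours.
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if labels[ny][nx] != -1:
                        continue
                    if colour_distance(image[y][x], image[ny][nx]) < threshold:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


# A tiny 2x3 image: a reddish region next to a near-white region.
example = [
    [(200, 0, 0), (205, 5, 0), (255, 255, 255)],
    [(198, 0, 2), (250, 250, 250), (255, 255, 255)],
]
example_labels = merge_similar_pixels(example)
```

Swapping `colour_distance` for a perceptual measure would bring the sketch closer to the idea described in the abstract, since the merging loop itself does not depend on how the distance is computed.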
Similar resources
EXTRACTION-BASED TEXT SUMMARIZATION USING FUZZY ANALYSIS
Due to the explosive growth of the world-wide web, automatic text summarization has become an essential tool for web users. In this paper we present a novel approach for creating text summaries. Using fuzzy logic and word-net, our model extracts the most relevant sentences from an original document. The approach utilizes fuzzy measures and inference on the extracted textual information from the docu...
Data Extraction using Content-Based Handles
In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. In our extraction algorithm, we are inspired by the way a human user scans the page content for specific data. In particular, we use text fea...
Integrating Fuzzy Inference System, Image Processing and Quality Control to Detect Defects and Classify Quality Level of Copper Rods
Human-based quality control reduces the accuracy of this process. Also, the speed of decision making in some industries is very important. For removing these limitations in human-based quality control, in this paper, the design of an expert system for automatic and intelligent quality control is investigated. In fact, using an intelligent system, the accuracy in quality control is increased. It...
Text segmentation in web images using colour perception and topological features
The research presented in this thesis addresses the problem of Text Segmentation in Web images. Text is routinely created in image form (headers, banners etc.) on Web pages, as an attempt to overcome the stylistic limitations of HTML. This text, however, has a potentially high semantic value in terms of indexing and searching for the corresponding Web pages. As current search engine technology d...
Robust Potato Color Image Segmentation using Adaptive Fuzzy Inference System
Potato image segmentation is an important part of image-based potato defect detection. This paper presents a robust potato color image segmentation through a combination of a fuzzy rule based system, an image thresholding based on Genetic Algorithm (GA) optimization and morphological operators. The proposed potato color image segmentation is robust against variation of background, distance and ...